An instruction buffer for a digital computer controls the flow of
instruction stream to an instruction decoder (32). As each
instruction is consumed, a shifter (70) removes the consumed bytes
and repositions the remaining bytes into the lowest order
positions. The byte positions left empty are filled by instruction
stream bytes retrieved from one of a pair of prefetch buffers
(64, 66) or from a virtual instruction cache (28). One prefetch
buffer (66) is filled from the instruction cache (28) after being
emptied, but prior to those particular bytes being requested by
the instruction decoder (32). The two level prefetching allows the
relatively slow process of cache access to be performed during
noncritical times.

INSTRUCTION BUFFER SYSTEM FOR A DIGITAL COMPUTER



This invention relates generally to an instruction buffer system
and to a virtual instruction cache (VIC) of a high-speed digital
computer.

In the field of high speed computers, most advanced computers
pipeline the entire sequence of instruction activities. A prime
example is the VAX 8600 computer manufactured and sold by Digital
Equipment Corporation, 111 Powdermill Road, Maynard, MA
01754-1418. The instruction pipeline for the VAX 8600 is described
in T. Fossum et al., "An Overview of the VAX 8600 System," Digital
Technical Journal, No. 1, August 1985, pp. 8-23. Separate pipeline
stages are provided for instruction fetch, instruction decode,
operand address generation, operand fetch, instruction execute,
and result store.

To make effective use of this pipelining capability, it is
desirable to keep each stage of the pipeline occupied, performing
its intended function on the next instruction to be executed. In
order to do this, the instruction fetch stage must retrieve an
instruction and pass it to the next stage between each transition
of the system clock. Otherwise, such a disruption in the
instruction stream causes the pipeline to drain, necessitating a
time-consuming restart of the entire pipeline. Of course, the
purpose of the pipeline is to increase the overall speed of the
computer. Thus, it is highly advantageous to avoid these
situations where the pipeline is interrupted.

However, the instruction set employed in some computers is of the
variable length type, thereby forcing the instruction buffer to
have added complexity. In other words, until the instruction
(opcode) is decoded, the instruction buffer does not "know" how
many of the subsequent bytes of the instruction stream belong with
the current instruction. Therefore, the instruction buffer can
only respond by loading a preselected number of bytes of the
instruction stream, which may or may not include an entire
instruction. The instruction decoder will only consume those bytes
associated with the immediate instruction. Thereafter, the
instruction buffer must determine how many of the present bytes
were used by the decoder, shift the unused bytes into the lowest
order locations, and then fill the empty buffer locations with
subsequent bytes of the instruction stream.
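
The shift-and-refill step just described can be illustrated with a
minimal sketch. It is a software model only, not the hardware of the
invention; the function name, the 9-byte buffer size, and the byte
values are assumptions chosen for illustration.

```python
def consume_and_refill(buffer, consumed, stream, next_index):
    """Drop the consumed bytes, shift the rest down, and refill the top.

    buffer     -- bytes currently presented to the decoder (fixed size)
    consumed   -- number of low-order bytes the decoder used
    stream     -- the full instruction stream (stand-in for memory)
    next_index -- position in `stream` of the first byte not yet loaded
    """
    size = len(buffer)
    unused = buffer[consumed:]        # bytes the decoder did not use
    needed = size - len(unused)       # empty high-order positions
    refill = stream[next_index:next_index + needed]
    # Unused bytes now occupy the lowest-order positions; new bytes follow.
    return unused + refill, next_index + len(refill)


stream = bytes(range(0x10, 0x30))     # 32 made-up instruction-stream bytes
buf, nxt = stream[:9], 9              # hypothetical 9-byte buffer, already loaded
buf, nxt = consume_and_refill(buf, 3, stream, nxt)   # decoder used 3 bytes
print(buf.hex(" "), nxt)
```

In this model the opcode of the next instruction always ends up in
byte position zero, which is the property the decoder relies on.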

Reference to the main memory to retrieve these subsequent bytes of
instruction stream necessarily involves multiple clock cycles. To
avoid accessing main memory, many digital computers include a high
speed cache between the processing unit and the main memory.
Access to this cache takes only a small number of cycles of the
processor's clock but often involves translating virtual addresses
to physical addresses. To further accelerate the access to the
instruction stream, some systems dedicate a cache solely to store
the instructions. The access to this "instruction cache" often
does not entail translating from virtual to physical addresses as
the instructions are stored under their virtual addresses. This
access to the instruction stream in a high speed virtual
instruction cache may only involve one cycle of the processor's
clock. The virtual instruction cache, however, contains only a
portion of the main memory, so each reference to the virtual
instruction cache involves comparing the requested address with
the stored address to first determine if the desired instruction
stream is present and then retrieving the requested instruction
stream. Therefore, owing to the variable length nature of the
instruction set, the instruction buffer cannot predict whether a
reference to the VIC will be required by the instruction currently
being decoded.

To prevent numerous references to the virtual instruction cache, a
prefetch buffer is provided to maintain a preselected number of
the subsequent bytes of instruction stream which are expected to
be used by the instruction decoder. This process forestalls the
inevitable reference to the virtual instruction cache.

Since the virtual instruction cache contains only a portion of the
instruction stream, refills to the instruction buffer can result
in "misses" in the virtual instruction cache, which require
fetches from the main memory. These main memory fetches generally
require many clock cycles, thereby interrupting the pipeline.

To ensure that the instruction pipeline of a digital computer
remains full to provide for fast and efficient execution of the
instructions, an instruction buffer includes first and second
prefetch buffers for storing a preselected number of subsequent
bytes of instruction stream. The first prefetch buffer is
independently addressable to retrieve a selected number of
sequential bytes contained therein. Means are provided for
refilling the decoder with the number of sequential bytes of
instruction stream corresponding to the number of bytes currently
being decoded. The refill means retrieves the instruction stream
bytes from the first prefetch buffer sequentially and sets a
"valid bit" corresponding to each byte of instruction stream
retrieved. The second instruction buffer need only contain all
valid or all invalid bytes, and therefore only one valid bit need
be held for the second instruction buffer. The first prefetch
buffer is refilled with the preselected number of subsequent
instruction stream bytes in response to all of the valid bits
corresponding to each byte of the instruction stream contained
therein being clear. Similarly, the second prefetch buffer is
refilled with the preselected number of subsequent instruction
stream bytes when it becomes empty.

Other objects and advantages of the invention will become apparent
upon reading the following detailed description and upon reference
to the drawings in which:

FIG. 1 is a top level block diagram of a portion of a central
processing unit and associated memory;

FIG. 2 is a functional diagram of the pipeline processing of a
longword ADD operand;

FIG. 3 is a block diagram of the virtual instruction cache;

FIG. 4 is a general block diagram of the instruction buffer
interfaced with the virtual instruction cache;

FIG. 5 is a detailed block diagram of the instruction buffer and
the interface to the instruction

FIG. 6 is a schematic diagram of the shifter of the instruction
buffer;

FIG. 7 is a schematic diagram of the rotator of the instruction
buffer;

FIG. 8 is a schematic diagram of the merge multiplexer of the
instruction buffer; and

FIG. 9 is a block diagram of the two-unit valid block store strams
of the virtual instruction cache.

While the invention is susceptible to various modifications and
alternative forms, specific embodiments thereof have been shown by
way of example in the drawings and will herein be described in
detail. It should be understood, however, that it is not intended
to limit the invention to the particular forms disclosed, but on
the contrary, the intention is to cover all modifications,
equivalents, and alternatives falling within the spirit and scope
of the invention as defined by the appended claims.

Turning now to the drawings, FIGURE 1 is a top level block diagram
of a portion of a pipelined computer system 10. The system 10
includes at least one central processing unit (CPU) 12 having
access to main memory 14. It should be understood that additional
CPUs could be used in such a system by sharing the main memory 14.

Inside the CPU 12, the execution of an individual instruction is
broken down into multiple smaller tasks. These tasks are performed
by dedicated, separate, independent functional units that are
optimized for that purpose.

Although each instruction ultimately performs a different
operation, many of the smaller tasks into which each instruction
is broken are common to all instructions. Generally, the following
steps are performed during the execution of an instruction:
instruction fetch, instruction decode, operand fetch, execution,
and result store. Thus, by the use of dedicated hardware stages,
the steps can be overlapped, thereby increasing the total
instruction throughput.

The data path through the pipeline includes a respective set of
registers for transferring the results of each pipeline stage to
the next pipeline stage. These transfer registers are clocked in
response to a common system clock. For example, during a first
clock cycle, the first instruction is fetched by hardware
dedicated to instruction fetch. During the second clock cycle, the
fetched instruction is transferred and decoded by instruction
decode hardware, but at the same time, the next instruction is
fetched by the instruction fetch hardware. During the third clock
cycle, each instruction is shifted to the next stage of the
pipeline and a new instruction is fetched. Thus, after the
pipeline is filled, an instruction will be completely executed at
the end of each clock cycle.
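
The overlap described above can be tabulated with a short sketch.
It assumes an idealized pipeline with no stalls and uses the five
generic steps named earlier as stage names; the function name and
the schedule format are illustrative assumptions.

```python
STAGES = ["fetch", "decode", "operand fetch", "execute", "result store"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Map each clock cycle to the (instruction, stage) pairs active in it.

    Idealized model: every stage takes one cycle and nothing stalls, so
    instruction i (1-based) is in stage s (0-based) during cycle i + s.
    """
    schedule = {}
    for instr in range(1, num_instructions + 1):
        for depth, name in enumerate(stages):
            schedule.setdefault(instr + depth, []).append((instr, name))
    return schedule


sched = pipeline_schedule(4)
for cycle in sorted(sched):
    print(cycle, sched[cycle])
```

With five stages and four instructions the model needs eight cycles
in total, and once the pipeline has filled, one instruction reaches
the final stage in every cycle.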

This process is analogous to an assembly line in a manufacturing
environment. Each worker is dedicated to performing a single task
on every product that passes through his or her work stage. As
each task is performed the product comes closer to completion. At
the final stage, each time the worker performs his or her assigned
task a completed product rolls off the assembly line.

As shown in FIG. 1, each CPU 12 is partitioned into at least three
functional units: the memory access unit 16, the instruction unit
18, and the execution unit 20.

The memory access unit 16 includes a main cache 22 which, on an
average basis, enables the instruction and execution units 18, 20
to process data at a faster rate than the access time of the main
memory 14. This cache 22 includes means for storing selected
predefined blocks of data elements, means for receiving requests
from the instruction unit 18 via a translation buffer 24 to access
a specified data element, means for checking whether the data
element is in a block stored in the cache 22, and means operative
when data for the block including the specified data element is
not so stored for reading the specified block of data from the
main memory 14 and storing that block of data in the cache 22. In
other words, the cache provides a window into the main memory, and
contains data likely to be needed by the instruction and execution
units 18, 20. The organization and operation of a similar cache
and translation buffer are further described in Chapter 11 of Levy
and Eckhouse, Jr., Computer Programming and Architecture, The
VAX-11, Digital Equipment Corporation, pp. 351-368 (1980).

If a data element needed by the instruction and execution units
18, 20 is not found in the cache 22, then the data element is
obtained from the main memory 14, but in the process, an entire
block, including additional data, is obtained from the main memory
14 and written into the cache 22. Due to the principle of locality
in time and memory space, the next time the instruction and
execution units desire a data element, there is a high degree of
likelihood that this data element will be found in the block which
includes the previously addressed data element. Consequently, it
is probable that the cache 22 will already include the data
element required by the instruction and execution units 18, 20. In
general, since the cache 22 is accessed at a much higher rate than
the main memory 14, the main memory 14 can have a proportionally
slower access time than the cache 22 without substantially
degrading the average performance of the computer system 10.
Therefore, the main memory 14 is constructed of slower and less
expensive memory elements.

The translation buffer 24 is a high speed associative memory which
stores the most recently used virtual-to-physical address
translations. In a virtual memory system, a reference to a single
virtual address can cause several memory references before the
desired information is made available. However, where the
translation buffer 24 is used, translation is reduced to simply
finding a "hit" in the translation buffer 24.

The instruction unit 18 includes a program counter 26 and a
virtual instruction cache (VIC) 28 for fetching instructions from
the main cache 22. The program counter 26 preferably addresses
virtual memory locations rather than the physical memory locations
of the main memory 14 and the cache 22. Thus, the virtual address
of the program counter 26 must be translated into the physical
address of the main memory 14 before instructions can be
retrieved. Accordingly, the contents of the program counter 26 are
transferred to the memory access unit 16 where the translation
buffer 24 performs the address conversion. The instruction is
retrieved from its physical memory location in the cache 22 using
the converted address. The cache 22 delivers the instruction over
data return lines to the VIC 28.

Generally, the VIC 28 contains prestored instructions at the
addresses specified by the program counter 26, and the addressed
instructions are available immediately for the transfer into an
instruction buffer (IBUFFER) 30. From the IBUFFER 30, the
addressed instructions are fed to an instruction decoder 32 which
decodes both the opcodes and the specifiers. An operand processing
unit (OPU) 34 fetches the specified operands and supplies them to
the execution unit 20.

The OPU 34 also produces virtual addresses. In particular, the OPU
34 produces virtual addresses for memory source (read) and
destination (write) operands. For the memory read operands, the
OPU 34 delivers these virtual addresses to the memory access unit
16 where they are translated to physical addresses. The physical
memory locations of the cache 22 are then accessed to fetch the
operands for the memory source operands.

In each instruction, the first byte contains the opcode, and the
following bytes are the operand specifiers to be decoded. The
first byte of each specifier indicates the addressing mode for
that specifier. This byte is usually broken in halves, with one
half specifying the addressing mode and the other half specifying
a register to be used for addressing. The instructions preferably
have a variable length, and various types of specifiers can be
used with the same opcode, as disclosed in Strecker et al., U.S.
Patent 4,241,397 issued December 23, 1980.

The first step in processing the instructions is to decode the
"opcode" portion of the instruction. The first portion of each
instruction consists of its opcode which specifies the operation
to be performed in the instruction, and the number and type of
specifiers to be used. Decoding is accomplished using a
table-look-up technique in the instruction decoder 32. Later, the
execution unit 20 performs the specified operation by executing
prestored microcode, beginning at a predetermined starting address
for the specified operation. Also, the decoder 32 determines where
source-operand and destination-operand specifiers occur in the
instruction and passes these specifiers to the OPU 34 for
preprocessing prior to execution of the instruction. A preferred
instruction decoder for use with the method and apparatus of the
present invention is described in the above referenced D. Fite et
al. U.S. patent application Serial No.      , filed      , and
entitled "Decoding Multiple Specifiers in a Variable Length
Instruction Architecture," incorporated herein by reference.

After an instruction has been decoded, the OPU 34 parses the
operand specifiers and computes their effective addresses; this
process involves reading GPRs and possibly modifying the GPR
contents by autoincrementing or autodecrementing. The operands are
then fetched from those effective addresses and passed on to the
execution unit 20, which executes the instruction and writes the
result into the destination identified by the destination pointer
for that instruction.

Each time an instruction is passed to the execution unit 20, the
instruction unit 18 sends a microcode dispatch address and a set
of pointers for (1) the location in the execution unit register
file where the source operands can be found, and (2) the location
where the results are to be stored. Within the execution unit 20,
a set of queues 36 includes a fork queue for storing the microcode
dispatch address, a source pointer queue for storing the source
operand locations, and a destination pointer queue for storing the
destination location. Each of these queues is a FIFO buffer
capable of holding the data for multiple instructions.

The execution unit 20 also includes a source list 38, which is a
multi-ported register file containing a copy of the GPRs and a
list of source operands. Thus, entries in the source pointer queue
will either point to GPR locations for register operands, or point
to the source list for memory and literal operands. Both the
memory access unit 16 and the instruction unit 18 write entries in
the source list 38, and the execution unit 20 reads operands out
of the source list 38 as needed to execute the instructions. For
executing instructions, the execution unit 20 includes an
instruction issue unit 40, a microcode execution unit 42, an
arithmetic and logic unit (ALU) 44, and a retire unit 46.

The present invention is particularly useful with pipelined
processors. As discussed above, in a pipelined processor, the
processor's instruction fetch hardware may be fetching one
instruction while other hardware is decoding the operation code of
a second instruction, fetching the operands of a third
instruction, executing a fourth instruction, and storing the
processed data of a fifth instruction. FIG. 2 illustrates a
pipeline for a typical instruction such as:

ADDL3 R0,B^12(R1),R2

This is a long word addition using the displacement mode of
addressing.

In the first stage of the pipelined execution of this instruction,
the program count (PC) of the instruction is created; this is
usually accomplished either by incrementing the program counter 26
from the previous instruction, or by using the target address of a
branch instruction. The PC is then used to access the VIC 28 in
the second stage of the pipeline.

In the third stage of the pipeline, the instruction data is
available from the cache 22 for use by the instruction decoder 32,
or to be loaded into the IBUFFER 30. The instruction decoder 32
decodes the opcode and the three specifiers in a single cycle, as
will be described in more detail below. The R0 and R2 numbers are
passed to the ALU 44, and the R1 number along with the byte
displacement is sent to the OPU 34 at the end of the decode cycle.

In stage four, the OPU 34 reads the contents of its GPR register
file at location R1, adds that value to the specified displacement
(12), and sends the resulting address to the translation buffer 24
in the memory access unit 16, along with an OP READ request, at
the end of the address generation stage.
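
The address generation performed in stage four amounts to a single
addition. The sketch below models it under stated assumptions: the
register contents are made up, the result is truncated to 32 bits,
and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF   # assumption: virtual addresses are 32 bits wide


def displacement_address(registers, base, displacement):
    """Effective address for displacement mode: register plus displacement."""
    return (registers[base] + displacement) & MASK32


registers = {"R0": 7, "R1": 0x0000_2000, "R2": 0}   # made-up register contents
operand_address = displacement_address(registers, "R1", 12)
print(hex(operand_address))
```

The resulting virtual address is what the translation buffer would
receive together with the read request.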

In stage five, the memory access unit 16 selects the address
generated in stage four for execution. Using the translation
buffer 24, the memory access unit 16 translates the virtual
address to a physical address during the address translation
stage. The physical address is then used to address the cache 22,
which is read in stage six of the pipeline.

In stage seven of the pipeline, the instruction is issued to the
ALU 44 which adds the two operands and sends the result to the
retire unit 46. During stage 4, the register numbers for R1 and
R2, and a pointer to the source list location for the memory data,
are sent to the execution unit and stored in the pointer queues.
Then during the cache read stage, the execution unit looks for the
two source operands in the source list. In this particular
example, it finds only the register data R0, but at the end of
this stage the memory data arrives and is substituted for the
invalidated read-out of the register file. Thus, both operands are
available in the instruction execution stage.

In the retire stage eight of the pipeline, the result data is
paired with the next entry in the retire queue. Although several
functional execution units can be busy at the same time, only one
instruction is retired in a single cycle.

In the last stage nine of the illustrative pipeline, the data is
written into the GPR portion of the register files in both the
execution unit 20 and the instruction unit 18.

Referring now to FIG. 3, a block diagram of the virtual
instruction cache (VIC) 28 is illustrated. The VIC 28 is
constructed of four groups of self-timed RAMs (STRAMs), and acts
as a window into the main memory 14. In this regard the VIC 28
functions in a similar fashion as the main cache 22. The first
group of VIC STRAMs is the data stram 50 which provides storage
space for the actual instruction stream (ISTREAM) retrieved from
the main cache 22. Specifically, the data stram 50 contains 1024
storage locations, with each storage location being 64 bits in
width. From the size of the data stram 50, it should be apparent
that the ISTREAM is retrieved in quadword (8-byte) packets.
Accordingly, the data path between the main cache 22 and the VIC
28 is also 64 bits in width and a quadword of ISTREAM can be
transferred during each system clock cycle.

The PC 26 delivers bits 12:3 of the 32-bit virtual address to the
data stram 50 in order to address each quadword of ISTREAM. Bits
2:0 are unnecessary, as they are only needed to address individual
bytes within each quadword. Individual byte addressability is not
necessary for the proper operation of the VIC 28. Rather, the
smallest increment of ISTREAM which can be addressed in the VIC 28
is a quadword. Further, the upper bits 31:13 are not used to
address the data stram 50 because only 1024 quadword locations are
available for storing the ISTREAM. Accordingly, the 10 bits 12:3
are sufficient to provide a unique address for each of the 1024
data storage locations (i.e., 2^10 = 1024).

However, it should be clear that since the upper bits 31:13 are
not used to address the data stram 50, there are multiple
quadwords which must be stored at identical data stram locations.
For example, the quadword located at address
11111111111111111110000000000 will be stored at the same data
stram location as the quadword located at address
01111111111111111110000000000. Both addresses share the same lower
10 bits and must, therefore, share the same data stram storage
location. In fact, each data stram location can host any one of
1,048,576 (2^20 = 1,048,576) quadwords.
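
The indexing scheme and the resulting aliasing can be checked with a
short sketch. It assumes that the two 29-digit example addresses
above are quadword addresses, that is, bits 31:3 of the virtual
address with the three byte-offset bits omitted; the function names
are illustrative.

```python
def data_stram_index(virtual_address):
    """Bits 12:3 of a 32-bit virtual address select one of 1024 slots."""
    return (virtual_address >> 3) & 0x3FF


def upper_bits(virtual_address):
    """Bits 31:13, which play no part in selecting the slot."""
    return (virtual_address >> 13) & 0x7FFFF


# Assumption: the example strings hold bits 31:3, so shifting left by 3
# restores the byte-offset bits 2:0 as zeros.
addr_a = int("11111111111111111110000000000", 2) << 3
addr_b = int("01111111111111111110000000000", 2) << 3

print(data_stram_index(addr_a), data_stram_index(addr_b))
print(bin(upper_bits(addr_a)), bin(upper_bits(addr_b)))
```

Both addresses yield the same slot index while their upper bits
differ, which is why the upper bits have to be remembered separately
for each stored quadword.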

Accordingly, in order to determine which of these quadwords is
stored in each of the data stram locations, a set of tag strams 52
is provided. The tag strams 52 store the upper nineteen bits 31:13
of the quadword address. However, ISTREAM is retrieved from the
main cache 22 in four quadword blocks. In other words, a request
to the main cache 22 for the first quadword in a block causes the
main cache 22 to also return the three following quadwords.
Retrieving ISTREAM in blocks satisfies the principle of locality
in time and memory space and aids the overall performance of the
VIC 28. Accordingly, the 1024 data stram locations are identified
by only 256 tag stram locations (1 for each four quadword block).
Thus, the tag stram 52 contains 256 19-bit storage locations and
8 bits (12:5) of the virtual address are sufficient to identify
each of the 256 storage locations (2^8 = 256).

Operation of the VIC 28 is enhanced by the method used for
retrieving ISTREAM from the main cache 22. The request for ISTREAM
is always quadword aligned and can be for any quadword within a
block. However, the main cache 22 only responds with the requested
quadword and all subsequent quadwords to fill the block. Quadwords
prior to the request in the block are not returned from the main
cache 22. For example, if the VIC 28 requests the third quadword
in a block, only the third and fourth quadwords are returned from
the main cache 22 and are written into the data stram 50. This
method of retrieving ISTREAM is employed for two reasons. First,
by returning the requested quadword first rather than the first
quadword in that block, the requested ISTREAM address is available
immediately and the critical response time is enhanced. Second,
performance models indicate that the remainder of the block is
hardly used.
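
The partial block fill can be expressed compactly. The sketch below
is a model only; it numbers quadwords from zero across the whole
stream, and the function name is an assumption.

```python
QUADWORDS_PER_BLOCK = 4


def returned_quadwords(requested):
    """Quadword numbers sent back by the main cache for one request.

    The requested quadword comes first, followed by the rest of its
    four-quadword block; earlier quadwords of the block are not sent.
    """
    block_start = requested - requested % QUADWORDS_PER_BLOCK
    return list(range(requested, block_start + QUADWORDS_PER_BLOCK))


for request in (0, 2, 7):
    print(request, returned_quadwords(request))
```

A request for the third quadword of the first block (number 2 when
counting from zero) yields only quadwords 2 and 3, matching the
example in the text.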

Since it is possible for only a portion of a block to be present
in the data stram 50, it is necessary to keep track of which
quadwords are valid. Therefore, a quadword valid stram 54 is
provided. A valid bit is maintained for each quadword in the data
stram 50. The quadword valid stram 54 is organized similar to the
tag stram 52, in that it contains 256 4-bit storage locations.
Each storage location corresponds to a four quadword block of data
stored in the data stram 50, with each of the four valid bits
corresponding to a quadword within the block. Thus, like the tag
stram 52, the quadword valid stram is addressed by the eight bits
12:5 of the virtual address.

Further, however, the individual quadword valid bits must also be
independently addressable in order to determine if a particular
ISTREAM quadword requested by the IBUFFER 30 is valid. A
multiplexer 56 is connected to the 4-bit output of the quadword
valid stram 54. The select input of the multiplexer 56 is
connected to quadword identifying bits 4:3 of the virtual address.
For example, a request from the IBUFFER 30 for the quadword stored
at location 00000000000000000001111111101000 results in the four
quadword valid bits stored at location 11111111 of the quadword
valid stram being delivered to the multiplexer 56. Bits 4:3 of the
virtual address indicate that the first quadword (location 01) is
the desired quadword. Thus, the select lines of the multiplexer 56
cause the quadword valid bit corresponding to the selected
quadword to be delivered at the multiplexer output.
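
The example address can be decoded in software to confirm the two
bit fields. In this sketch the valid stram is modeled as a list of
256 four-element lists, and the ordering of the four bits within an
entry (list index equals bits 4:3) is an assumption, as are the
function names.

```python
def block_index(va):
    """Bits 12:5 address one of the 256 four-bit entries."""
    return (va >> 5) & 0xFF


def quadword_select(va):
    """Bits 4:3 drive the multiplexer select lines."""
    return (va >> 3) & 0x3


def quadword_valid_bit(valid_stram, va):
    four_bits = valid_stram[block_index(va)]   # 4-bit stram output
    return four_bits[quadword_select(va)]      # multiplexer picks one bit


va = int("00000000000000000001111111101000", 2)
valid_stram = [[False] * 4 for _ in range(256)]
valid_stram[0b11111111][0b01] = True           # made-up stram contents

print(block_index(va), quadword_select(va), quadword_valid_bit(valid_stram, va))
```

For this address the entry index is 11111111 and the select value
is 01, in agreement with the worked example.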

Finally, the fourth group of VIC strams 58 contains valid bits for
each block stored in the data stram 50. Thus, the block valid
stram 58 contains 256 1-bit storage locations and is addressed by
bits 12:5 of the virtual address. Not only is it necessary for the
VIC 28 to "know" which quadwords within a block are valid, but
also, the VIC 28 needs to verify that the block itself is valid.
At this time it is sufficient to understand that the block valid
bit must be set before the VIC 28 will allow the selected quadword
to be transferred to the IBUFFER 30. However, it should be noted
that the block valid stram actually consists of two sets of strams
to speed operation of the VIC 28 during a flush. At any given
time, a selected one of the two sets of strams stores the block
valid bits which reflect the current status of the data in the VIC
28. The addressed block valid bit, representing the validity of
the addressed block of data in the VIC 28, is selected by a
multiplexer 238 as either the "BLOCK_A_VALID" bit from the first
set of strams (set A), or the "BLOCK_B_VALID" bit from the second
set of strams (set B). This aspect of the VIC 28 is discussed in
greater detail in conjunction with the description of the
operation of the circuit shown in FIG. 9.

During an IBUFFER request for a selected quadword of ISTREAM, the
virtual address contained in the PC 26 is delivered to the VIC 28.
The VIC 28 responds to the request by determining if the requested
quadword is present in the data stram 50 and, if so, whether it is
valid. Bits 31:13 of the PC virtual address are delivered to one
input of a 19-bit comparator 60. The second input to the
comparator 60 is connected to the output of the tag stram 52.
Previously, bits 31:13 of the address of the quadword stored in
the data stram 50 were stored in the tag stram 52. Therefore,
those previously stored bits 31:13 are presented as the second
input to the comparator 60. If the two addresses match, the
asserted output of the comparator 60 is delivered as one input to
the 3-input AND gate 62. At the same time, the block and quadword
valid bits are also delivered as inputs to the AND gate 62.
Accordingly, if any of the three signals is not asserted, the AND
gate 62 produces a MISS signal. Conversely, if all three signals
are asserted, the AND gate 62 produces a HIT signal. A MISS signal
initiates a request to the main cache 22, while a HIT signal
causes the data STRAM 50 to deliver the selected quadword of data.
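
The hit determination can be modeled as the logical AND of three
conditions. The class below is a minimal software sketch, not the
circuit itself: the class and method names are hypothetical, a
single block valid array stands in for the two sets of strams, and
the `fill` policy of clearing the quadword valid bits when a block
changes owner is an assumption.

```python
class VicModel:
    BLOCKS = 256          # tag, quadword valid, and block valid entries
    QW_PER_BLOCK = 4

    def __init__(self):
        self.tags = [0] * self.BLOCKS
        self.block_valid = [False] * self.BLOCKS
        self.qw_valid = [[False] * self.QW_PER_BLOCK for _ in range(self.BLOCKS)]
        self.data = [None] * (self.BLOCKS * self.QW_PER_BLOCK)   # 1024 quadwords

    @staticmethod
    def fields(va):
        """Split a 32-bit address into tag (31:13), block (12:5), quadword (4:3)."""
        return (va >> 13) & 0x7FFFF, (va >> 5) & 0xFF, (va >> 3) & 0x3

    def fill(self, va, quadword):
        tag, block, qw = self.fields(va)
        if not self.block_valid[block] or self.tags[block] != tag:
            # Assumption: a new owner invalidates the block's old quadwords.
            self.qw_valid[block] = [False] * self.QW_PER_BLOCK
            self.tags[block] = tag
            self.block_valid[block] = True
        self.data[block * self.QW_PER_BLOCK + qw] = quadword
        self.qw_valid[block][qw] = True

    def lookup(self, va):
        tag, block, qw = self.fields(va)
        tag_match = self.tags[block] == tag                  # comparator
        hit = tag_match and self.block_valid[block] and self.qw_valid[block][qw]
        if hit:                                              # 3-input AND
            return "HIT", self.data[block * self.QW_PER_BLOCK + qw]
        return "MISS", None


vic = VicModel()
print(vic.lookup(0x1FE8))
vic.fill(0x1FE8, b"ABCDEFGH")
print(vic.lookup(0x1FE8))
```

In this model a lookup misses if the tag differs, if the block has
never been filled, or if the particular quadword within a valid
block has not yet arrived.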

The PC 26 is actually constructed of several separate program
counters. During each system clock cycle, one of two PCs (PREFETCH
PC or MTAG) is selected and its virtual address is delivered to
the VIC 28. Generally, the virtual address contained in the
PREFETCH PC is selected and delivered to the VIC 28. The PREFETCH
PC always points to the next quadword that the IBUFFER is likely
to accept. In sequential code the PREFETCH PC is incremented by
one quadword each time the IBUFFER accepts ISTREAM from the VIC
28. When the ISTREAM branches, the PREFETCH PC is loaded with the
correct destination address.

However, when ISTREAM is requested from and delivered by the main
cache 22, the virtual address contained in the MTAG is selected
and delivered to the VIC 28. When the VIC 28 receives multiple
quadwords of ISTREAM from the main cache 22, the address of the
VIC 28 must be incremented by a quadword in each cycle of the main
cache response. The PREFETCH PC would serve this purpose if the
instruction decoder 32 could always consume all of the ISTREAM as
it arrives from the main cache 22. In practice this is not always
possible. Therefore, a second PC, independent from the PREFETCH
PC, is used to store the ISTREAM in the VIC 28. Once the response
from the main cache 22 is complete, the PREFETCH PC is again used
to address the VIC 28. The MTAG is loaded with the previous value
of the VIC address when there is no request to the main cache 22.

Referring now to FIG. 4, the IBUFFER 30 is illustrated. The
IBUFFER 30 aligns the data for decoding and performs the function
of increasing the processing speed of the instruction unit 18 by
prefetching subsequent sequential instructions. The IBUFFER 30
retrieves a selected quadword of the ISTREAM and positions that
quadword such that the instruction decoder 32 receives the
instruction with the opcode positioned in the zero byte location.
In order to accomplish this complex task of repositioning the
ISTREAM, the IBUFFER 30 is separated into five major functional
sections: IBEX 64 & IBEX2 66, ROTATOR 68, SHIFTER 70, MERGE
MULTIPLEXER 72, and IBUF 74.

Rather than simply increase the size of the instruction decoder
32 to contain more bytes of the ISTREAM, a pair of prefetching
buffers IBEX 64 and IBEX2 66 are disposed intermediate the
decoder 32 and the VIC 28. IBEX 64 and IBEX2 66 are quadword
buffers functionally positioned between the VIC 28 and the IBUF
74 and operational to retrieve the next sequential quadword of
ISTREAM while the decoder 32 is operating on the present
instruction. This prefetching normally hides the time required
for a VIC access by performing the instruction fetch during the
time in which the decoder 32 is busy. Any one of the quadwords
stored in the VIC 28 is controllably storable in the IBEX 64 and
IBEX2 66. As discussed previously, the PREFETCH PC controls
operation of the VIC 28 to select and deliver a quadword of
ISTREAM.

The quadword currently selected by the PREFETCH PC is stored in
the IBEX 64 while the next subsequent quadword of ISTREAM is
retrieved from the VIC 28 and stored in the IBEX2 66.

The purpose of the IBEX 64 and IBEX2 66 is to prefetch the
subsequent two quadwords of ISTREAM and sequentially provide
these bytes of ISTREAM to fill the IBUF 74 as each instruction is
consumed by the instruction decoder 32. It should be noted that
the present computer system preferably employs an instruction set
which is of the variable length type. Accordingly, until the
instruction decoder 32 actually decodes the opcode of the
instruction, the number of bytes dedicated to the instant
instruction is not "known" by the IBUFFER 30. Therefore, the
IBUFFER 30 does not "know" how many bytes will be consumed by the
instruction decoder 32 and will need to be refilled by the
IBUFFER 30. Thus, the logic which controls the operation of the
IBEX 64, IBEX2 66, and VIC 28 must be capable of determining the
number of bytes needed to fill the decoder 32, which location or
multiple locations contain the desired bytes, and whether those
bytes are valid.

The control logic for operating the IBEX 64, IBEX2 66, and VIC 28
includes a multiplexer 76 with control logic 78 operating the
select inputs of the multiplexer 76. The IBEX 64, IBEX2 66, and
VIC 28 each includes an 8-byte wide data path connected to the
inputs of the multiplexer 76 such that any input may be selected
by the control logic 78 and delivered over an 8-byte wide data
path to the rotator 68 and to the IBEX 64. The IBEX2 66 is
connected directly to the VIC 28 and receives the next sequential
quadword of ISTREAM over the 8-byte data path therebetween.
Operation of the multiplexer 76 and control logic 78 is discussed
in greater detail in conjunction with the description
accompanying FIGS. 8 and 10.

The merge multiplexer 72, rotator 68 and shifter 70 interact to
maintain the 9-byte instruction decoder 32 filled with the next
nine sequential bytes of ISTREAM. As the decoder 32 completes the
decoding stage of each instruction, those consumed bytes are
shifted out and discarded by the shifter 70. The rotator 68 acts
to provide the next sequential bytes of ISTREAM to replace those
bytes which were discarded. In this manner, the instruction
buffer 30 attempts to provide at least the next 9 bytes of
ISTREAM to the instruction decoder 32. Therefore, independent of
the length of the present instruction, the decoder 32 is assured
that for the majority of instructions (relatively few
instructions require more than 9 bytes) the entire instruction is
present and available for decoding.

The IBUF 74 is a 9-byte register for storing the results of the
merge multiplexer 72 until the decoder 32 is available to accept
the ISTREAM. Further, the output of the IBUF 74 is also connected
to the input of the shifter 70.

Turning now to FIG. 5, the data paths to and from the instruction
decoder 32 are shown in greater detail. In order to
simultaneously decode a number of operand specifiers, the IBUF 74
is linked to the instruction decoder 32 by a data path 80 for
conveying the values of up to nine bytes of an instruction
currently being decoded. Associated with the eight bits of each
byte is a parity bit for detecting any single bit errors in the
byte, and also a valid data flag for indicating whether the IBUF
74 has, in fact, been filled with data from the VIC 28 as
requested by the program counter 26.

The instruction decoder 32 decodes a variable number of
specifiers depending upon the particular opcode being decoded,
the amount of valid data in the IBUF 74, and whether the
downstream stages in the pipeline are available to accept more
specifiers. Specifically, the instruction decoder 32 inspects the
opcode to determine the number of subsequent bytes which are
associated with that particular instruction. Then the decoder 32
checks the valid data flags to determine how many of the
associated specifiers can be decoded and then decodes these
specifiers in a single cycle. The instruction decoder 32 delivers
a signal indicating the number of bytes that were decoded in
order to remove these bytes from the IBUF 74. For example, if the
opcode includes four bytes of associated specifiers, the decoder
inspects the valid data flags to ensure that these four bytes are
valid and then decodes these specifiers. Thereafter, the decoder
instructs the shifter 70 to remove the opcode and the consumed
four bytes and move the upper four bytes into the low order four
byte locations. This shifting process is effective to move the
next opcode into the zero byte location of the IBUF 74.

The IBUF 74 need not be large enough to hold an entire
instruction, so long as it may hold at least three specifiers of
the kind which are typically found in an instruction. The
instruction decoder 32 is somewhat simplified if the byte 0
position of the IBUF 74 holds the opcode while the other bytes of
the instruction are shifted into and out of the IBUF 74. In
effect, the IBUF 74 holds the opcode in byte 0 and functions as a
first-in, first-out buffer for byte positions 1 through 8. The
instruction decoder 32 is also simplified by the operating
criteria that only the specifiers for a single instruction are
decoded during each cycle of the system clock. Therefore, at the
end of a cycle in which all of the specifiers for an instruction
will have been decoded, the instruction decoder 32 transmits a
"shift opcode" signal to the shifter 70 in order to shift the
opcode out of the byte 0 position of the IBUF 74 so that the next
opcode may be received in the byte 0 position.

The VIC 28 is preferably arranged to receive and transmit
instruction data in blocks of multiple bytes of data. The block
size is preferably a power of two so that the blocks have memory
addresses specified by a certain number of most significant bits
in the address provided by the program counter 26. For example,
in the preferred embodiment each block consists of 32 bytes or
four quadwords and is addressed by a 32-bit address. Thus, bits
31-5 are unique for each block. Further, owing to the
instructions being of variable length, the addresses of the
opcodes within the ISTREAM occur at various positions within the
block. To load byte 0 of the IBUF 74 with the next opcode to be
executed, which may occur at any byte position within a block of
instruction data from the cache, the rotator 68 is disposed in
the data path from the VIC 28 to the IBUF 74. The rotator 68, as
well as the shifter 70, are comprised of cross-bar switches. The
data path from the VIC 28 includes eight parallel busses, one bus
being provided for each byte of the ISTREAM.






EP0380884 A2 






In the general case, it is necessary to keep track of the number
of valid bytes in the IBUF 74. The number of valid bytes at any
particular instance is kept in a register called IBUF VALID COUNT
81. The value of this register is the previous IBUF VALID COUNT
minus the number of bytes shifted plus the number of new bytes
merged through MERGE MUX 72. Similarly, it is necessary to know
how many bytes remain in IBEX 64. Any bytes that have been moved
into the IBUF 74 are considered invalid. As IBUF 74 becomes full,
the remaining bytes from the quadword of data or a complete new
quadword are stored in IBEX. The number of valid bytes in IBEX is
stored in a similar register called IBEX VALID COUNT. This is not
a hardware register but the output from combinational logic that
produces either the previous IBEX VALID COUNT minus the number of
bytes merged into the IBUF 74 if IBEX is being selected into MUX
76, or eight bytes minus the number of bytes merged into the IBUF
74 if IBEX2 or VIC is selected into MUX 76.
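
The two bookkeeping rules above reduce to simple arithmetic. The following Python sketch restates them; the function and parameter names are hypothetical, and the hardware computes these values with combinational logic rather than software.

```python
def next_ibuf_valid_count(prev_count, bytes_shifted, bytes_merged):
    """IBUF VALID COUNT: previous count, minus bytes shifted out,
    plus new bytes merged in through the MERGE MUX."""
    return prev_count - bytes_shifted + bytes_merged


def next_ibex_valid_count(prev_ibex_count, bytes_merged, ibex_selected):
    """IBEX VALID COUNT.

    If IBEX itself feeds MUX 76, only its remaining valid bytes can
    be drawn down. If IBEX2 or the VIC feeds MUX 76, a fresh 8-byte
    quadword arrives and whatever is not merged is left in IBEX.
    """
    available = prev_ibex_count if ibex_selected else 8
    return available - bytes_merged
```

For instance, a fresh quadword from the VIC of which five bytes are merged leaves three valid bytes in IBEX.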

At the beginning of a program or after a branch or jump
instruction is executed, it is desirable to load the IBUF 74 with
entirely new data from the VIC 28. For this purpose,
combinational logic 82 controlling the merge multiplexer 72
receives an IBUF VALID COUNT of zero so that all of the select
lines S0-S8 are not asserted and the merge multiplexer 72 selects
data from only the B0 to B8 inputs. Since none of the
instructions in the IBUF 74 are valid they are discarded, and
only the new instructions contained in ROTATOR 68 are presented
to the IBUF 74.

In order to load new ISTREAM into the IBUF 74 from the VIC 28,
the MERGE MUX 72 is used to select the number of bytes from the
ROTATOR 68 to be merged with a select number of bytes from the
shifter 70. If the signal SHIFT OP is asserted, the output of the
SHIFTER 70 will be the IBUF 74 bytes 0 through 8 shifted down by
the number to shift; otherwise, if SHIFT OP is not asserted, the
output of the shifter will be IBUF 74 byte 0 in position A0 with
IBUF 74 bytes 1 through 8 shifted down by the number of bytes to
shift.

Also, when the IBUF 74 is initially loaded, there will be an
offset between the address corresponding to the opcode and the
data from VIC 28. In particular, this offset is given by the
least significant bits of the program counter 26. As shown in
FIG. 5, a quadword of ISTREAM (eight bytes) is delivered to the
ROTATOR 68; thus, using the three least significant bits from the
program counter 26 as the rotate value, the opcode byte is
delivered to the B0 input of merge mux 72. For example, suppose
the program branches to an address in the second quadword of a
block whose least significant three bits are 5. When the VIC
provides that quadword, the ROTATOR 68 rotates by 5 bytes and
delivers byte 5 to the B0 input of MERGE MUX 72.
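
As an illustration of how a branch target maps onto the block, quadword, and initial rotate value, here is a small Python sketch. The field widths follow the 32-byte block and 8-byte quadword stated in the text; the function name split_pc and the sample address 0x0D are hypothetical and chosen only for the example.

```python
def split_pc(pc):
    """Split a 32-bit virtual address into VIC addressing fields.

    Returns (block, quadword, offset):
      block    -- bits 31-5, unique for each 32-byte block
      quadword -- bits 4-3, which of the four quadwords in the block
      offset   -- bits 2-0, byte within the quadword; on an initial
                  load this is also the rotate value
    """
    block = pc >> 5
    quadword = (pc >> 3) & 0x3
    offset = pc & 0x7
    return block, quadword, offset


# Hypothetical target: byte 5 of the second quadword of block 0.
block, quadword, rotate = split_pc(0x0D)
```

With this sample address the quadword index is 1 (the second quadword) and the rotate value is 5, so byte 5 of that quadword would be steered to the B0 input.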

In the general case, though, the rotate value is calculated using
the formula:

rotate value = 8 - IBEX_VALID_COUNT
               - (IBUF_VALID_COUNT - NO._BYTES_TO_SHIFT)

For example, if there are nine valid bytes in the IBUF 74 and
three in IBEX (bytes 5, 6, 7 of a quadword) and the number of
bytes to shift is two, the rotate value is minus two; therefore
the rotator shifts up by two (as the result was negative). Thus,
the rotator 68 delivers byte 5 of the quadword in IBEX 64 to the
B7 input on merge mux 72, and byte 6 to B8 (byte 7 is of no
interest as it will not be merged; it is, however, delivered to
the B0 input). Positive rotate values will cause the ROTATOR 68
to shift down. Thus, combinational logic 90 controlling the
rotator 68
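
The rotate value formula can be checked numerically with a one-line Python function. The function name is hypothetical; the arithmetic is taken directly from the formula above.

```python
def rotate_value(ibex_valid_count, ibuf_valid_count, bytes_to_shift):
    """Rotate value for the ROTATOR.

    A negative result means the rotator shifts up (toward higher
    byte positions); a positive result means it shifts down.
    """
    return 8 - ibex_valid_count - (ibuf_valid_count - bytes_to_shift)


# Nine valid bytes in the IBUF, three in IBEX, two bytes to shift.
example = rotate_value(ibex_valid_count=3, ibuf_valid_count=9,
                       bytes_to_shift=2)
```

Here example evaluates to -2, matching the worked case in the text.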



The control for the MERGE MUX in combinational logic 82 produces
individual select lines S0 - S8 for the merge mux 72 such that
the relevant bytes from the SHIFTER and ROTATOR are delivered to
the IBUF 74. If SHIFT OP is not asserted then S0 always selects
the A0 input such that the opcode byte remains in byte 0 of the
IBUF 74. The remaining selects are calculated as follows:

MERGE_VALUE = IBUF_VALID_COUNT - NO._BYTES_TO_SHIFT

Any select (S1-S8) less than MERGE_VALUE selects the SHIFTER 70,
and the rest select the ROTATOR 68.

For example, if there are eight valid bytes in the IBUF 74 and
the number to shift is three, the merge value is five, so S1, S2,
S3, S4 select the output from the SHIFTER 70 but S5, S6, S7, S8
select the output from the ROTATOR 68.
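
The select rule can be written out as a short Python sketch. The names are hypothetical. One assumption is made beyond the text: when SHIFT OP is asserted, S0 is treated by the same "less than MERGE_VALUE" rule as S1-S8, since the text states the S0 behaviour explicitly only for the case where SHIFT OP is not asserted.

```python
def merge_selects(ibuf_valid_count, bytes_to_shift, shift_op):
    """Return a list of nine strings, one per select line S0..S8.

    "A" means the SHIFTER input is selected, "B" the ROTATOR input.
    """
    merge_value = ibuf_valid_count - bytes_to_shift
    selects = []
    for position in range(9):
        if position == 0 and not shift_op:
            selects.append("A")      # opcode stays in byte 0
        elif position < merge_value:
            selects.append("A")      # byte still valid after shifting
        else:
            selects.append("B")      # fill from the rotator
    return selects
```

With eight valid bytes and three to shift, positions 1 through 4 come from the shifter and positions 5 through 8 from the rotator. With a valid count of zero and SHIFT OP asserted, every position is filled from the rotator, as on an initial load.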

Since the ROTATOR 68 receives eight bytes of data but transmits
nine bytes to the MERGE MUX 72, the nine bytes delivered to the
B0 - B8 inputs are never all valid. The ninth byte gets the same
data as the first byte but it is only valid when the rotate value
is negative.

Once an opcode has been loaded into the byte 0 position of the
IBUF 74, the instruction decoder 32 examines it and the other
bytes in the IBUF 74 to determine whether it is possible to
simultaneously decode up to three operand specifiers. The
instruction decoder 32 further separates the source operands from
the destination operands. In particular, in a single cycle of the
system clock, the instruction decoder 32 may decode up to two
source operands and one destination operand.

Flags indicating whether source operands or a destination operand
are decoded for each cycle are transmitted from the instruction
decoder 32 to the OPU 34.

The instruction decoder 32 simultaneously decodes up to three
register specifiers per cycle. When a register specifier is
decoded, its register address is placed on the transfer bus TR
and sent to the source list queue 38 via transfer unit 82 in the
OPU 34.

The instruction decoder 32 may decode one short literal specifier
per cycle. According to the VAX instruction architecture, the
short literal specifier must be a source operand specifier. When
the instruction decoder 32 decodes a short literal specifier, the
short literal data is transmitted over a bus (EX) to an expansion
unit 84 in the OPU 34.

Preferably the instruction decoder 32 is capable of decoding one
complex specifier per cycle. The complex specifier data is
transmitted by the instruction decoder 32 over a general purpose
bus (GP) to a general purpose unit 86 in the OPU 34.

Once all of the specifiers for the instruction have been decoded,
the instruction decoder 32 transmits the "shift op" signal to the
shifter 70. The instruction decoder also transmits a microprogram
fork address to a fork queue in the queues 38, as soon as a valid
opcode is received by the IBUF 74.

Referring now to FIG. 6, a schematic diagram of the shifter 70 is
shown. The A0-A8 byte inputs of the merge multiplexer 72 are
illustrated connected to the 8-bit outputs of a bank of
multiplexers which comprise the shifter 70. It should be
remembered that the purpose of the shifter 70 is to move the
unused portion of the instruction stream contained in the IBUF 74
into those bytes of the IBUF 74 which were previously consumed by
the instruction decoder 32. For example, if, during the previous
cycle, the instruction decoder 32 used the three lowest bytes (0,
1, 2) of the IBUF 74, then in order to properly present the next
instruction to the decoder 32, it is preferable to shift the
remaining valid six bytes (3-8) into the low order six bytes of
the IBUF 74.

Accordingly, the consumed low order bytes are no longer of any
immediate use to the decoder 32 and are discarded. Thus, the
shifter 70 need only move high order bytes into low order byte
positions and does not rotate the low order bytes into the high
order byte positions. This requirement simplifies the shifter
configuration for the higher order bytes since each byte position
only receives shifted bytes from those positions which are
relatively higher. For example, byte position six only receives
shifted bytes from its two higher order positions (7 and 8),
while byte position one receives shifted bytes from its seven
higher order positions (2-8).

To better describe this process, the internal configuration of
one of the multiplexer banks is illustrated and generally shown
at 102. The multiplexer bank 102 receives bytes 6, 7, and 8 from
the IBUF 74 and delivers an output to the A6 input of the merge
multiplexer 72. Within the multiplexer bank 102 is a group of
eight 3-input multiplexers 102a-102h. The multiplexer 102a
receives the zero bit of each of the input bytes 6, 7, and 8 at
input locations 0, 1, and 2 respectively. Similarly, the
multiplexers 102b-102h receive bits 1-7 respectively of the three
input bytes. The select lines for each of the multiplexers
102a-102h are connected to the instruction decoder 32 and carry
the 3-bit signal "number to shift". The "number to shift" signal
is, of course, the number of bytes that were consumed by the
instruction decoder 32.

Therefore, it can be seen that the select lines of the
multiplexers 102a-102h act to deliver all eight bits of the
selected byte. For example, if the decoder 32 consumes two bytes
of the ISTREAM, then the contents of the IBUF 74 are shifted by
two bytes, such that byte eight is moved into the sixth byte
location. Accordingly, the "number to shift" signal is set to the
value two, thereby selecting the third input to the multiplexers
102a-102h. Thus, the byte eight position is selected and
delivered to the merge multiplexer input A6.

The internal structure of the remaining multiplexer banks 104-114
is substantially similar, varying only in the number of input
bytes. The multiplexer bank 114 has an output connected to the A7
input of the merge multiplexer 72. The inputs to the multiplexer
114 include only bytes 7 and 8 of the IBUF 74. The multiplexer
bank 112 has an output connected to the A5 input of the merge
multiplexer 72. The inputs to the multiplexer 112 include bytes
5, 6, 7, and 8 of the IBUF 74. The multiplexer bank 110 has an
output connected to the A4 input of the merge multiplexer 72. The
inputs to the multiplexer 110 include bytes 4, 5, 6, 7, and 8 of
the IBUF 74. The multiplexer bank 108 has an output connected to
the A3 input of the merge multiplexer 72. The inputs to the
multiplexer 108 include bytes 3, 4, 5, 6, 7, and 8 of the IBUF
74. The multiplexer bank 106 has an output connected to the A2
input of the merge multiplexer 72. The inputs to the multiplexer
106 include bytes 2, 3, 4, 5, 6, 7, and 8 of the IBUF 74.

The multiplexer bank 104 differs slightly from the other
multiplexer banks, in that its output is directly connected to
the merge multiplexer 72 and also the zero byte position of the
IBUF 74. The byte zero case is additionally complicated by a
requirement that in addition to the shifter 70 being capable of
moving any of the higher order bytes into the zero byte position,
the shifter 70 must also be capable of retaining the current zero
byte while the remaining bytes are shifted. This feature is
desired because byte zero contains the opcode. Thus, if the
specifiers extend beyond the length of the IBUF 74, then the
consumed bytes must be shifted out and new specifiers rotated in,
but the opcode must remain until the entire instruction is
decoded. Accordingly, the inputs to the multiplexer 104 include
bytes 1, 2, 3, 4, 5, 6, 7, and 8 of the IBUF 74. However, the
output of the multiplexer 104 is delivered to one input of a bank
of multiplexers 116. The second input to the multiplexer bank 116
is connected to the zero byte position of the IBUF 74. A single
bit select line is connected to the instruction decoder 32
through an OR gate 118, so that when the instruction decoder 32
issues either a "shift opcode" or an "FD shift opcode" signal,
the select line is asserted and the output of the multiplexer 104
is delivered to the A0 input of the merge multiplexer 72.
Otherwise, if neither of these signals is asserted, then byte 0
is selected and delivered to the A0 input of the merge
multiplexer 72.
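
The behaviour of the shifter, including the special handling of byte 0, can be modelled in Python as follows. This is a behavioural sketch only: the function name is hypothetical, positions with no source byte are returned as None (in hardware their contents are simply not selected by the merge multiplexer), and the "shift opcode" and "FD shift opcode" signals are folded into a single shift_op flag.

```python
def shifter(ibuf, number_to_shift, shift_op):
    """Return the nine shifter outputs A0..A8.

    ibuf            -- list of the nine IBUF bytes, index 0 = opcode
    number_to_shift -- bytes consumed by the decoder this cycle
    shift_op        -- True if the opcode itself is shifted out
    """
    outputs = []
    for position in range(9):
        source = position + number_to_shift
        # Each position only receives bytes from higher positions;
        # nothing is rotated around from the low order end.
        outputs.append(ibuf[source] if source <= 8 else None)
    if not shift_op:
        outputs[0] = ibuf[0]     # hold the opcode in byte 0
    return outputs
```

For a shift of two with shift_op asserted, A6 receives IBUF byte 8, which agrees with the multiplexer bank 102 example in the text.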

Referring now to FIG. 7, there is shown a schematic diagram of
the rotator 68. The B0-B8 byte inputs of the merge multiplexer 72
are illustrated as connected to the 8-bit outputs of a bank of
multiplexers which comprise the rotator 68. It should be
remembered that the purpose of the rotator 68 is to rotate the
next quadword of ISTREAM so that the merge multiplexer 72 can
fill the IBUF 74 with the valid low order bytes of the shifter 70
and the rotated high order bytes of the rotator 68. Further,
unlike the shifter 70 (FIG. 6), each of the multiplexer banks in
the rotator 68 is capable of delivering any of the input bytes at
its output.

For example, if, during the previous cycle, the instruction
decoder 32 uses the three lowest bytes (0, 1, 2) of the IBUF 74,
then the shifter 70 moves the remaining valid six bytes into the
low order six bytes (0-5) of merge multiplexer inputs A0-A5.
Thus, the rotator 68 rotates its low order three bytes into
positions 6, 7, and 8 so that the merge multiplexer 72 can
combine A0-A5 and B6-B8 to fill the IBUF 74. The low order three
bytes available from the multiplexer 76 could be the low order
three bytes of IBEX2 66 or the VIC 28 or any three consecutive
bytes of IBEX 64.

To better describe this process, the internal configuration of
one of the multiplexer banks is illustrated and generally shown
at 132. The multiplexer bank 132 receives bytes 0-7 from either
the VIC 28, IBEX 64 or IBEX2 66, as described in conjunction with
FIGS. 4, 8, and 10. The output of the multiplexer bank 132 is
delivered to the B4 input of the merge multiplexer 72. Within the
multiplexer bank 132 is a group of eight 8-input multiplexers
132a-132h. The multiplexer 132a receives the zero bit of each of
the input bytes 0-7 at multiplexer 132a input locations 4-3 (that
is, 4, 5, 6, 7, 0, 1, 2, 3) respectively. Similarly, the
multiplexers 132b-132h receive bits 1-7 respectively of all of
the eight input bytes. The select lines for each of the
multiplexers 132a-132h receive the 3-bit rotate value as
described in conjunction with FIG. 5. The signal is, of course,
the number of byte positions that the ISTREAM should be rotated
to properly fill the IBUF 74.

It can be seen that if the rotate value is selected to be a value
of three by the rotator control logic 90, the multiplexers
132a-132h will each select the input located at position three.
Accordingly, bits 0-7 of input byte seven are selected and
delivered to the B4 input of the merge multiplexer 72. Therefore,
in response to a request for a three byte rotate, the input byte
seven is delivered to byte position four.

The remaining multiplexer banks 134-148 are substantially similar
to the multiplexer bank 132, differing only in the order in which
the input bytes are connected to the multiplexer banks 132-148.
For example, the same request for a three byte rotate causes
multiplexer bank 140 to deliver the sixth input byte to byte
position three (B3).
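
The rotator examples are all consistent with one simple rule: output position p receives input byte (p + rotate value) modulo 8. The Python sketch below uses that rule. It is an inference from the worked examples rather than a statement of the actual wiring, and the function name is hypothetical.

```python
def rotator(quadword, rotate):
    """Return the nine rotator outputs B0..B8.

    quadword -- list of eight input bytes (index 0 = byte 0)
    rotate   -- rotate value; may be negative

    Position 8 wraps around to the same input byte as position 0,
    which models the ninth output carrying the same data as the
    first.
    """
    return [quadword[(position + rotate) % 8] for position in range(9)]
```

With a rotate value of three, B4 receives input byte 7 and B3 receives input byte 6, as in the two examples above. Python's modulo operator returns a non-negative result for negative rotate values, so the same expression also covers upward rotation.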

Consider now the combined effect of the operation of the rotator
68 and shifter 70. Assume both IBUF 74 and IBEX 64 are full. Also
assume that the decoder 32 has consumed the low order three bytes
of the IBUF 74. The decoder 32 produces a value of three as the
"number to shift" signal. The shifter 70 responds to this signal
by relocating the ISTREAM so that positions A0-A8 of the merge
multiplexer 72 respectively receive positions 3, 4, 5, 6, 7, 8,
6, 7, 8. At the same time the rotator control logic 90 delivers
the rotate value to the rotator 68. The rotate value is set to
the value minus six. Accordingly, the rotator 68 rotates its
contents so that positions B0-B8 of the merge multiplexer 72
respectively receive positions 3, 4, 5, 6, 7, 8, 0, 1, 2.
Therefore, the merge multiplexer successfully combines the two
inputs to deliver the next nine bytes of ISTREAM to the IBUF 74
by selecting inputs A0-A5 and B6-B8.
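
To make the combined example concrete, the following Python sketch performs one refill step: shift, rotate, and merge. It is a simplified behavioural model under stated assumptions: the opcode is shifted out along with the consumed bytes, output position p of the rotator receives input byte (p + rotate) modulo 8, and shifter positions with no source byte are represented as None. All names are hypothetical.

```python
def refill_step(ibuf, ibuf_valid, source, source_valid, number_to_shift):
    """One IBUF refill cycle.

    ibuf            -- nine bytes currently in the IBUF
    ibuf_valid      -- how many of them are valid (low order first)
    source          -- eight bytes presented to the rotator
    source_valid    -- how many high order source bytes are unused
    number_to_shift -- bytes consumed by the decoder

    Returns (new_ibuf, new_ibuf_valid, new_source_valid, rotate).
    """
    merge_value = ibuf_valid - number_to_shift
    rotate = 8 - source_valid - merge_value

    a = [ibuf[p + number_to_shift] if p + number_to_shift <= 8 else None
         for p in range(9)]                              # shifter
    b = [source[(p + rotate) % 8] for p in range(9)]     # rotator
    merged = [a[p] if p < merge_value else b[p] for p in range(9)]

    taken = min(9 - merge_value, source_valid)
    return merged, merge_value + taken, source_valid - taken, rotate


ibuf = ["I%d" % i for i in range(9)]
quad = ["Q%d" % i for i in range(8)]
new_ibuf, new_valid, left_in_source, rot = refill_step(ibuf, 9, quad, 8, 3)
```

In this run the rotate value is -6 and the new IBUF contents are I3 through I8 followed by Q0, Q1, Q2, leaving five unused bytes in the source quadword.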

Referring now to FIG. 8, there is shown a schematic diagram of
the merge multiplexer 72 and merge multiplexer control logic 82.
It should be remembered that the merge multiplexer 72 operates
under control of the logic 82 to select the next nine bytes of
ISTREAM from the two sets of 9-byte inputs from the rotator 68
and shifter 70. Generally, the low order bytes are selected from
the shifter 70 while the rotator 68 fills the remaining high
order byte positions.

The control logic 82 receives the "number to shift" signal (m)
and the IBUF VALID COUNT and uses the values of these signals to
select the proper input bytes.

The merge multiplexer 72 includes nine banks of multiplexers 150,
152, 154, 156, 158, 160, 162, 164, 166 with each bank receiving
two byte position inputs, one byte each from the rotator 68 and
shifter 70. Thus, the select line connected to each bank of
multiplexers is asserted to select the rotator input and
unasserted to select the shifter input.

To better describe this process, the internal configuration of
one of the multiplexer banks is illustrated and generally shown
at 150. The multiplexer bank 150 receives bits 0-7 from the zero
byte position of both the shifter 70 (A00-A07) and rotator 68
(B00-B07). The output of the multiplexer bank 150 is delivered to
the zero byte position of the IBUF 74. Contained within the
multiplexer bank 150 is a group of eight 2-input multiplexers
150a-150h. The multiplexer 150a receives the zero bit of both of
the zero position input bytes such that an asserted value on the
select line delivers B00 and an unasserted value delivers A00.
Similarly, the multiplexers 150b-150h receive bits 1-7
respectively of both of the input bytes. The select lines for
each of the multiplexers 150a-150h receive a 1-bit select signal
from the priority decoder 82 in order to commonly deliver all
eight bits of the selected byte to the zero input position of the
IBUF 74.

Within the control logic 82, the "number to shift" signal (m) is
subtracted from the IBUF VALID COUNT to determine the lowest
order byte position into which the rotator inputs should be
delivered. The signal m is delivered to a 1's complement
generator 168 to convert the signal m into a negative value. The
signal -m is delivered to an adder 170 which performs the
arithmetic operation and delivers the result to a 4:16 decoder
172. Accordingly, the lower order nine output bits of the decoder
produce a single asserted signal at the numeric position
corresponding to the lowest order byte position into which the
rotator inputs should be delivered. Therefore, this asserted byte
position and all higher order byte positions should be asserted
to properly select rotator inputs at the corresponding
multiplexers.

For example, as discussed previously, if the "number to shift"
signal is set to a value of three, then the rotator inputs should
be selected for byte positions 6 through 8. The output of the
decoder 172 asserts only the line corresponding to byte position
6. Thus, a bank of OR gates 174 are connected to the outputs of
the decoder 172 to provide asserted signals to the multiplexers
corresponding to the asserted line and all higher order byte
positions.
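
The subtract, decode, and OR-chain structure can be sketched in Python as below. This is a behavioural illustration with hypothetical names: the complement generator and adder are collapsed into a single 4-bit subtraction, and the bank of OR gates is modelled as a running OR over the decoder outputs from low to high byte position.

```python
def merge_select_lines(ibuf_valid_count, m):
    """Return nine booleans, one per merge multiplexer bank.

    True means the select line is asserted (rotator input chosen);
    False means the shifter input is chosen.
    """
    # Adder 170: IBUF VALID COUNT plus the negated "number to shift",
    # kept to four bits for the 4:16 decoder.
    first_rotator_position = (ibuf_valid_count - m) & 0xF

    # 4:16 decoder 172: exactly one output asserted.
    decoded = [i == first_rotator_position for i in range(16)]

    # OR gates 174: assert the decoded position and all higher ones.
    selects = []
    asserted = False
    for position in range(9):
        asserted = asserted or decoded[position]
        selects.append(asserted)
    return selects
```

With a full IBUF and a shift of three, positions 6 through 8 are asserted. With both inputs at zero, the decoder asserts position 0 and the OR chain asserts every select line.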

During normal operation the "number to shift" signal controls the
operation of the merge multiplexer 72. However, at the beginning
of a program or at a context switch, the "number to shift" signal
is zero and the IBUF VALID COUNT is zero and the entire contents
of the rotator 68 are loaded into the IBUF 74. Therefore, the
output of the adder 170 is zero, enabling all of the outputs of
the bank of OR gates 174. Thus, the select lines to the
multiplexers 150-166 all act to select the B inputs and pass the
entire contents of the rotator to the IBUF 74.

The control logic 78 for operating the multiplexer 76 of FIG. 4
selects either IBEX 64, IBEX2 66 or VIC 28 according to the
following priority scheme.

The control logic 78 selects IBEX 64, IBEX2 66 or VIC 28 with a
simple priority algorithm. If IBEX is not empty then IBEX 64 is
delivered to the ROTATOR 68; otherwise, if IBEX2 is valid, it is
delivered to the rotator 68; and if both IBEX is empty and IBEX2
is not valid, VIC data is delivered to the ROTATOR 68.
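
The priority scheme is a three-way choice and can be expressed directly in Python. The function name and the string return values are hypothetical; "IBEX is not empty" is modelled here as a non-zero IBEX VALID COUNT.

```python
def select_rotator_source(ibex_valid_count, ibex2_valid):
    """Choose which 8-byte input MUX 76 passes to the ROTATOR.

    Priority: IBEX first (it holds the oldest unused bytes),
    then IBEX2, and the VIC only when both buffers have nothing.
    """
    if ibex_valid_count > 0:
        return "IBEX"
    if ibex2_valid:
        return "IBEX2"
    return "VIC"
```

Draining IBEX before IBEX2 keeps the ISTREAM bytes in sequential order, since IBEX always holds bytes that precede those in IBEX2.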

IBEX is loaded each cycle with the data delivered by MUX 76 but
it is marked empty either on a FLUSH or when all valid data on
the ROTATOR 68 is consumed by the IBUF 74. In other words, IBEX
VALID COUNT becomes non-zero when MUX 76 provides data to ROTATOR
68 that cannot find a place in IBUF 74. For example, after a
branch or jump instruction has been executed, IBUF 74, IBEX 64
and IBEX2 66 are cleared (FLUSHED) and the VIC is accessed for
the new ISTREAM. Assume it branches to the first byte of a block
that is in the VIC 28. The first quadword from the VIC 28 is
presented to MUX 76, which passes the data through the ROTATOR 68
and MERGE MUX to IBUF 74. IBEX is loaded with the data but is not
marked valid, as all eight bytes went into the IBUF 74. In the
following cycle the VIC 28 presents the second quadword to MUX
76, which passes it to the ROTATOR 68. Now assuming the DECODER
32 decodes less than eight bytes, say four bytes, the SHIFTER 70
shifts out 4 bytes, the ROTATOR 68 rotates by four and the MERGE
MUX 72 passes four bytes from the shifter 70 and five bytes from
the ROTATOR 68; then IBEX contains three unused bytes of ISTREAM,
so IBEX VALID COUNT is set to three.

IBEX2 can be considered a stall buffer for the VIC 28. Because of
the pipelined nature of creating a new prefetch address,
accessing the VIC strams, then checking for a VIC HIT, it is
impractical to stop this process as soon as IBEX contains some
valid bytes. Thus data from the VIC 28 is loaded into IBEX2 66
the cycle after IBEX 64 is loaded with some valid data, and IBEX2
66 is marked valid if it is a VIC HIT. Take the above example,
where a branch to the first byte of a valid block in the VIC 28
is executed. The address of the first quadword is moved to
PREFETCH PC in the first cycle. In the second cycle the first
quadword is delivered to IBUF 74 and PREFETCH PC moves on to the
second quadword. In the third cycle, the second quadword is
delivered to IBUF 74 and IBEX 64 and the PREFETCH PC moves to the
third quadword. In the fourth cycle, assuming DECODER 32 consumes
no more bytes, the third quadword is delivered to IBEX2 and
PREFETCH PC moves to the fourth quadword and we decide to stall.
In the fifth cycle the VIC 28 delivers the fourth quadword to MUX
76 but IBEX 64 data is passed to the ROTATOR 68.

As can be seen in the above example,
prefetching of ISTREAM can move significantly
ahead of the instruction in the IBUF. One benefit of
the VIC 28 is that accesses to the main cache 22
are significantly reduced. However, this benefit will
be severely reduced if prefetching continues too far
ahead of the decoded instruction stream. On
average, a branch instruction occurs once in every
sixteen bytes of ISTREAM, so it is essential that
prefetching does not access the main cache 22
unless the data is likely to
be used. Thus, a request to the main cache for
data is only made if there is a VIC MISS, IBEX2 is
not valid and IBEX is empty. This usually means
seven or eight bytes are still available to the
DECODER 32 when the request for a VIC block is
made.
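
The request condition stated above reduces to a single predicate. The sketch below is illustrative only; the function and parameter names are hypothetical and do not come from the specification.

```python
def should_request_main_cache(vic_hit: bool,
                              ibex2_valid: bool,
                              ibex_valid_count: int) -> bool:
    """Return True only when all three conditions from the text hold:
    a VIC MISS, IBEX2 not valid, and IBEX empty."""
    vic_miss = not vic_hit
    ibex_empty = ibex_valid_count == 0
    return vic_miss and (not ibex2_valid) and ibex_empty
```

Any buffered prefetch data therefore suppresses the main cache request, which keeps prefetching from running far ahead of the decoder on a miss.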

Referring now to FIG. 9, there is shown a block
diagram of the two-bit valid block store stram 58
of the virtual instruction cache 28. Since the VIC 28
is a virtual cache, it must be flushed on a context
switch or REI instruction. In other words, all 256 of
the 1-bit storage locations must be marked as
invalid. Unfortunately, only one storage location can
be marked as invalid during each clock cycle.
Accordingly, it is possible that if all 256 bits are set
to their valid condition, then it takes 256 clock
cycles to clear the block valid stram 58.

As shown in FIG. 8, there are two block valid
strams 220, 222 (BVSA, BVSB). One of the strams
is used to determine if the presently requested
address "hits" or "misses" in the VIC 28. While the
first stram is determining hit/miss, the second stram
is being cleared at the rate of one storage location
during each clock cycle. Therefore, assuming that
256 cycles have elapsed since the last context
switch, then the second stram is clear and a
context switch is accomplished in only a single cycle
by switching the functions of the two strams. It
should be appreciated that each stram 220, 222 is
configured to perform either hit/miss determination
or valid bit clearing; in fact, each context switch
causes BVSA and BVSB to switch to the opposite
function.
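
The two-stram arrangement described above is a form of double buffering: one store answers lookups while the other is cleared in the background. The following is a behavioural sketch under stated assumptions, not the circuit itself; the class name `BlockValidStore`, its method names, and the choice to start with both strams clear are all hypothetical.

```python
NUM_BLOCKS = 256  # one 1-bit location per VIC block, per the text


class BlockValidStore:
    def __init__(self):
        # Two strams (BVSA, BVSB); both are assumed clear at start-up.
        self.strams = [[False] * NUM_BLOCKS, [False] * NUM_BLOCKS]
        self.active = 0               # stram used for hit/miss lookups
        self.cleared = NUM_BLOCKS     # locations cleared in the idle stram

    def lookup(self, block: int) -> bool:
        return self.strams[self.active][block]

    def mark_valid(self, block: int) -> None:
        self.strams[self.active][block] = True

    def tick(self) -> None:
        """One clock cycle: clear one location of the idle stram."""
        if self.cleared < NUM_BLOCKS:
            self.strams[1 - self.active][self.cleared] = False
            self.cleared += 1

    def ready(self) -> bool:
        return self.cleared >= NUM_BLOCKS

    def context_switch(self) -> None:
        """Swap roles in a single step; the old stram starts clearing."""
        if not self.ready():
            raise RuntimeError("idle stram is not yet fully cleared")
        self.active ^= 1
        self.cleared = 0


store = BlockValidStore()
store.mark_valid(5)
store.context_switch()            # single-step flush
hit_after_switch = store.lookup(5)
ready_immediately = store.ready()
for _ in range(NUM_BLOCKS):
    store.tick()
ready_after_256 = store.ready()
```

The flush itself costs one step because only the role of each stram changes; the 256-cycle clearing cost is paid afterwards, off the critical path.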

BVSA and BVSB each receive a single 8-bit
address from respective multiplexers 224, 226.
Both of the multiplexers 224, 226 receive a pair of
addresses from the PC 26 and a reset control 228.
In order to present the PC address to one of the
strams 220, 222 and the reset address to the other
stram 220, 222, the select lines to the multiplexers
224, 226 are operated in a complementary fashion.

The reset control 228 receives a CONTEXT
SWITCH signal from the execution unit 20 and
begins to sequentially present addresses 0-255 to the
multiplexers 224, 226. One of the multiplexers 224,
226 passes these sequential addresses to the
selected stram 220, 222, such that the 256 valid bits
contained therein are reset over a period of 256
clock cycles.

In order to prevent the execution unit from
initiating a context switch before one of the strams
220, 222 is reset, the reset control delivers a
handshaking signal to indicate that the reset process is
complete. An S-R flip flop 230 receives the
handshaking signal at its set input, causing the flip flop
230 to latch a PROCEED WITH CONTEXT
SWITCH signal to the execution unit 20. The
SWITCH CONTEXT signal from the execution unit
20 is also connected to the reset input of the flip
flop 230 so that the PROCEED WITH CONTEXT
SWITCH signal is reset at the beginning of each
context switch.
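
The handshake can be illustrated with a small behavioural model of the S-R flip flop and the reset counter. This is a sketch under assumptions: the handshaking signal is modelled as a one-cycle pulse raised when the last of the 256 locations is cleared, and the names `SRFlipFlop` and `proceed_after` are hypothetical.

```python
NUM_LOCATIONS = 256


class SRFlipFlop:
    def __init__(self, q: bool = False):
        self.q = q

    def update(self, s: bool, r: bool) -> bool:
        if s and r:
            raise ValueError("S and R must not be asserted together")
        if s:
            self.q = True
        elif r:
            self.q = False
        return self.q


def proceed_after(cycles: int) -> bool:
    """Return PROCEED WITH CONTEXT SWITCH after `cycles` clock cycles
    have elapsed since a SWITCH CONTEXT signal."""
    ff = SRFlipFlop(q=True)
    ff.update(s=False, r=True)        # SWITCH CONTEXT resets the flip flop
    addr = 0                          # next reset address to present
    for _ in range(cycles):
        if addr < NUM_LOCATIONS:
            addr += 1                 # one location cleared this cycle
            done = addr == NUM_LOCATIONS   # assumed handshake pulse
            ff.update(s=done, r=False)
    return ff.q
```

In this model the execution unit sees PROCEED WITH CONTEXT SWITCH deasserted until the 256th clearing cycle completes, after which it stays latched until the next context switch.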

Control of the select lines to the multiplexers
224, 226 is provided by a J-K flip flop 232 which
toggles in response to each CONTEXT SWITCH signal. Both
inputs of the flip flop 232 are connected to a logical
"1" and the clock input is connected to the
CONTEXT SWITCH signal. Thus, the Q output (USE
BLOCK B) of the flip-flop 232 switches between
"0" and "1" in response to a transition in the
SWITCH CONTEXT signal. The select input of the
multiplexer 224 is connected directly to the Q
output of the flip-flop 232, while the select input of
the multiplexer 226 is connected to the Q output of
the flip-flop 232 through an inverter 234.
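
The complementary select scheme can be modelled as follows. The text does not say which multiplexer input carries the PC address and which carries the reset address, so the input ordering below is an assumption, as are the names `JKFlipFlop`, `mux` and `stram_addresses`.

```python
class JKFlipFlop:
    def __init__(self):
        self.q = 0

    def clock(self, j: int, k: int) -> int:
        """Apply one clock transition with the given J and K inputs."""
        if j and k:
            self.q ^= 1               # J = K = 1: toggle
        elif j:
            self.q = 1
        elif k:
            self.q = 0
        return self.q


def mux(select: int, in0: int, in1: int) -> int:
    return in1 if select else in0


def stram_addresses(use_block_b: int, pc_addr: int, reset_addr: int):
    """Return (address to BVSA, address to BVSB).

    Assumption: input 0 of each multiplexer is the PC address and
    input 1 is the reset address.
    """
    addr_a = mux(use_block_b, pc_addr, reset_addr)        # multiplexer 224
    addr_b = mux(use_block_b ^ 1, pc_addr, reset_addr)    # 226 via inverter 234
    return addr_a, addr_b


ff = JKFlipFlop()
q_values = [ff.clock(1, 1) for _ in range(3)]   # three context switches
```

Because the two select lines are always complementary, exactly one stram sees the PC address and the other sees the reset address in every cycle.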

In a similar fashion the block valid data
(MARKER BLOCK VALID) from the PC unit (26 in
FIG. 1) is multiplexed between the data inputs of
the strams 220, 222 in response to the USE
BLOCK B signal. For this purpose, the data input
of the "B" stram 222 is connected to the MARKER
BLOCK VALID line through an AND gate 237 which
is enabled by the USE BLOCK B signal, and the
data input of the "A" stram 220 is connected to the
MARKER BLOCK VALID line through an AND gate
enabled by the complement of the USE BLOCK B
signal as provided by an inverter 239. Therefore,
when the USE BLOCK B signal is asserted, the
MARKER BLOCK VALID data is fed into the "B"
stram 222 while the "A" stram receives zero data
and is therefore cleared. Conversely, when the
USE BLOCK B signal is not asserted, the MARKER
BLOCK VALID data is fed into the "A" stram 220
while the "B" stram receives zero data and is
therefore cleared.
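
The data-input gating amounts to two AND gates driven by USE BLOCK B and its complement. A minimal sketch, with the hypothetical function name `stram_data_inputs` and signals modelled as 0/1 integers:

```python
def stram_data_inputs(marker_block_valid: int, use_block_b: int):
    """Return (data input of "A" stram 220, data input of "B" stram 222)."""
    data_b = marker_block_valid & use_block_b           # AND gate 237
    data_a = marker_block_valid & (use_block_b ^ 1)     # AND gate fed by inverter 239
    return data_a, data_b
```

The stram that is not in use always receives zero data, so each write cycle addressed by the reset control clears one of its locations.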

Finally, the valid bit outputs of the strams 220,
222 are connected to a pair of inputs to a
multiplexer 236. The select line of the multiplexer 236
is also connected to the Q output of the flip flop
232 to operate in conjunction with the multiplexers
224, 226. Accordingly, the stram 220, 222 which is
selected to receive the PC address is also selected
to deliver its output as the BLOCK VALID BIT.



Claims 

1. An instruction buffer system for a digital
computer for controlling the delivery of instruction
stream bytes to an instruction decoder (32) capable
of simultaneously decoding a variable number of
instruction bytes; the system being characterised
by:

(1) an instruction buffer (74) having multiple
byte locations for receiving the instruction bytes to
be decoded;

(2) first (64) and second (66) prefetch buffers
for storing a preselected number of subsequent
bytes of the instruction stream;

(3) means (72) for refilling the instruction
buffer (74) with a selected number of sequential
bytes of the instruction stream retrieved from at
least one of the first (64) and second (66) prefetch
buffers;

(4) means for refilling the first prefetch buffer
(64) with sequential bytes of the instruction stream
when the first prefetch buffer is emptied; and

(5) means for refilling the second prefetch
buffer (66) with sequential bytes of the instruction
stream when the second prefetch buffer is emptied.

2. An instruction buffer system, as claimed in
Claim 1, including means for delivering a signal
responsive to the number of bytes of instruction
stream contained in an instruction currently being
decoded, and further including a shifter (70) for
receiving the said signal and shifting the contents
of the instruction buffer (74) by a number of bytes
representative of the said number.

3. An instruction buffer system, as claimed in
Claim 1 or Claim 2, wherein the instruction buffer
refilling means (72) includes means for retrieving
sequential bytes of the instruction stream from one
of the first (64) and second (66) prefetch buffers to
fill the buffer locations from which instruction
stream bytes have been removed.

4. An instruction buffer system, as claimed in
Claim 2 or Claim 3, wherein the instruction buffer
refilling means (72) includes means (68) for
receiving the sequential bytes of instruction stream
retrieved from the first (64) and second (66) prefetch
buffers and rotating the bytes by a preselected
number of byte locations responsive to the number
of bytes indicated by the said signal.

5. An instruction buffer system, as claimed in
any one of Claims 2 to 4, wherein the means for
refilling the first (64) and second (66) prefetch
buffers is operative in response to the absence of
the said signal.

6. An instruction buffer system, as claimed in
any one of the preceding claims, including means
for receiving the instruction stream from a virtual
instruction cache (28) in response to both of the
first (64) and second (66) prefetch buffers having
been emptied of instruction stream bytes.

7. An instruction buffer system for a digital
computer for controlling the delivery of an
instruction stream to an instruction decoder (32) capable
of simultaneously decoding a variable number of
instruction bytes, the decoder (32) having means
for delivering a signal responsive to the number of
bytes of instruction stream being decoded, the
system being characterised by:

(1) an instruction buffer (74) for maintaining a
preselected number of the next required sequential
bytes of instruction stream, and means for
delivering the said preselected number of instruction
stream bytes to the decoder;

(2) first means for prefetching and maintaining
in a first prefetch buffer (64) a preselected
number of sequential bytes of the instruction
stream;

(3) second means for prefetching and
maintaining in a second prefetch buffer (66) a
preselected number of the next sequential bytes of the
instruction stream subsequent to the bytes of
instruction stream maintained in the first prefetch
buffer (64);

(4) a shifter for receiving the said signal and
shifting the contents of the instruction buffer (74)
by a preselected number of bytes responsive to
the said signal, and delivering the shifted bytes to
the instruction buffer;

(5) means for retrieving sequential bytes of
the instruction stream from one of the first (64) and
second (66) prefetch buffers to fill the instruction
buffer positions from which bytes of the instruction
stream have been removed;

(6) means for refilling the first (64) prefetch
buffer with subsequent instruction stream bytes in
response to the first prefetch buffer (64) being
emptied; and

(7) means for refilling the second (66)
prefetch buffer with subsequent instruction stream
bytes in response to the second prefetch buffer
(66) being emptied.

8. An instruction buffer system, as claimed in
Claim 7, wherein the instruction buffer refilling
means includes means for receiving the sequential
bytes of instruction stream retrieved from the first
(64) and second (66) prefetch buffers and rotating
the bytes by a preselected number of byte
locations in response to the said signal.

9. An instruction buffer system, as claimed in
Claim 7 or Claim 8, wherein the means for refilling
the first (64) and second (66) prefetch buffers are
operative in response to the absence of the said
signal.

10. An instruction buffer system, as claimed in
any one of Claims 7 to 9, wherein the means for
retrieving to fill the instruction buffer (74) includes
means for retrieving bytes of the instruction stream
from a cache memory (28) in response to the first
(64) and second (66) prefetch buffers having been
emptied.

11. A virtual instruction cache (28)
for a digital computer arranged to store a selected
portion of an instruction stream therein and being
adapted to replace the said selected portion with
another portion of the instruction stream in
response to a context switch, the virtual instruction
cache being organised into blocks of a preselected
number of bytes of the instruction stream, each of
the blocks having associated therewith a valid bit
which is set to indicate that at least a portion of the
instruction stream bytes stored in that block are
valid; characterised in that the valid bits are
organised and stored in a valid bit RAM which
comprises:

(1) first and second valid bit stores (220,
222), each having a preselected number of storage
locations;

(2) means (226, 224) for delivering a
preselected address to one of the valid bit stores;

(3) means (238) for receiving the valid bit;

(4) means for resetting all of the valid bits
stored in the other valid bit store;

(5) means for alternately selecting the first
valid bit store and the second valid bit store to be
the said one valid bit store in response to a context
switch.

12. A virtual instruction cache, as claimed in
Claim 11, wherein the resetting means (228)
includes means for delivering a reset signal in
response to all of the storage locations being reset
and means for delaying the context switch in
response to the absence of the reset signal.

13. A virtual instruction cache, as claimed in
Claim 11 or Claim 12, wherein the means for
alternately selecting includes first (224) and second
(226) multiplexers each having first and second
inputs respectively connected to the preselected
address delivering means and the resetting means
(228), an output connected to the first and second
valid bit stores, the first multiplexer (224) having a
select input connected to means (232) for
alternately cycling between an asserted and unasserted
state in response to a context switch, and the
second multiplexer (226) having a select input
connected through an inverter to the alternately
cycling means (232).

14. A virtual instruction cache, as claimed in
Claim 12 or Claim 13, wherein the means for
alternately selecting includes an output multiplexer
having first and second inputs respectively
connected to the outputs of the first and second valid
bit stores (220, 222), an output and a select input
connected to the means (232) for alternately
cycling between an asserted and unasserted state in
response to a context switch.